otel: replace cobraotel with native lifecycle management #3108

Merged
miparnisari merged 6 commits into authzed:main from Jdepp007004:fix/otel-lifecycle-native
May 26, 2026

Conversation

Contributor

@Jdepp007004 Jdepp007004 commented May 9, 2026

Description

Closes #3095 and #712

Testing

  1. Run docker-compose up --build, send a few requests to the server, then open localhost:3000. You should see traces.
  2. Shut down the Docker containers. You should see this log: "shutting down OTel provider"

@Jdepp007004 Jdepp007004 requested a review from a team as a code owner May 9, 2026 14:47
@github-actions github-actions Bot added area/cli Affects the command line area/dependencies Affects dependencies area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels May 9, 2026
@Jdepp007004 Jdepp007004 closed this May 9, 2026
@Jdepp007004 Jdepp007004 reopened this May 9, 2026
@Jdepp007004 Jdepp007004 changed the title from "removes vendor directory from tracking" to "otel: replace cobraotel with native lifecycle management" May 9, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators May 9, 2026
@authzed authzed unlocked this conversation May 13, 2026
Contributor

@tstirrat15 tstirrat15 left a comment

See comments

Comment thread pkg/cmd/server/otel.go Outdated
Comment thread pkg/cmd/server/otel.go Outdated
Comment thread pkg/cmd/server/otel.go Outdated
Comment thread pkg/cmd/server/otel.go Outdated
Comment thread pkg/cmd/server/otel.go Outdated
Comment thread pkg/cmd/util/util.go Outdated
Comment on lines +441 to +442
func RegisterCommonFlags(cmd *cobra.Command) {
otel := cobraotel.New("spicedb")
otel.RegisterFlags(cmd.Flags())
f := cmd.Flags()
Contributor

I like this

Contributor Author

Thanks, glad it reads better there

Contributor

@miparnisari miparnisari May 26, 2026

The code in RegisterOTelFlags looks awfully similar :)

I think this code can be deleted. None of the other commands (other than spicedb serve) use the otel flags.

EDIT: I've gone ahead and deleted it.

Comment thread pkg/cmd/serve.go Outdated
Comment thread pkg/cmd/server/otel_integration_test.go Outdated
Comment on lines +22 to +25
cmd.SetContext(context.Background())
require.NoError(t, cmd.Flags().Set("otel-provider", "otlpgrpc"))
require.NoError(t, cmd.Flags().Set("otel-endpoint", "localhost:4317"))
require.NoError(t, cmd.Flags().Set("otel-insecure", "true"))
Contributor

I generally like the tests. Can we add a test or two that establishes 1. that you can use the otel environment variables to configure flags that aren't set and 2. which value overrides which if you declare both?

Contributor Author

Done, added TestOTelConfig_EnvVarConfiguresUnsetFlag and TestOTelConfig_ExplicitFlagOverridesEnvVar to the integration test file. First one verifies the SDK picks up OTEL_EXPORTER_OTLP_ENDPOINT when the endpoint flag is not explicitly set, second one documents that an explicit flag value takes precedence over the env var.
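The precedence rule those two tests describe can be sketched in a few lines. This is only an illustration: `resolveEndpoint` and its parameters are hypothetical names, not code from this PR, and the sketch models the lookup explicitly, whereas the comment above says the SDK itself reads `OTEL_EXPORTER_OTLP_ENDPOINT` when the flag is unset.

```go
package main

import "fmt"

// resolveEndpoint is a hypothetical helper (not part of the PR).
// An explicitly set flag wins; otherwise the value falls back to the
// OTEL_EXPORTER_OTLP_ENDPOINT environment variable.
// getenv is injected so the rule can be exercised without touching
// the real process environment.
func resolveEndpoint(flagValue string, flagChanged bool, getenv func(string) string) string {
	if flagChanged {
		return flagValue
	}
	return getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
}

func main() {
	// Fake environment with only the endpoint variable set.
	env := func(key string) string {
		if key == "OTEL_EXPORTER_OTLP_ENDPOINT" {
			return "collector:4317"
		}
		return ""
	}

	// Flag not set: the environment variable is used.
	fmt.Println(resolveEndpoint("", false, env))
	// Flag set explicitly: the flag value takes precedence.
	fmt.Println(resolveEndpoint("localhost:4317", true, env))
}
```

Passing the "was the flag changed" bit separately from the value is a design choice in the sketch: it distinguishes an unset flag from one explicitly set to its default.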

Contributor

I removed these tests because they weren't really asserting anything useful

Contributor

@miparnisari miparnisari left a comment

Would it be possible to add a new struct within

type Config struct {

called otelConfig or something like that, so that anyone doing server.NewConfigWithOptionsAndDefaults can set these programmatically?

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.02%. Comparing base (49ef12e) to head (ccb9fda).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cmd/server/otel.go | 86.03% | 10 Missing and 3 partials ⚠️ |
| pkg/cmd/server/server.go | 50.00% | 2 Missing and 1 partial ⚠️ |
| pkg/cmd/serve.go | 33.34% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3108      +/-   ##
==========================================
+ Coverage   75.99%   76.02%   +0.04%     
==========================================
  Files         566      567       +1     
  Lines       64277    64370      +93     
==========================================
+ Hits        48840    48930      +90     
- Misses      11764    11767       +3     
  Partials     3673     3673              

@Jdepp007004
Contributor Author

Addressing miparnisari's review — added OTelConfig as a struct in the server package with an OTel field on Config so it can be set programmatically without going through Cobra flags. InitOTelProvider now takes that struct directly along with a context and has no Cobra dependency at all.

Fixes authzed#712 and authzed#3095.

- Remove dependency on github.com/jzelinskie/cobrautil/v2/cobraotel
- Replicate OTel provider initialization natively in pkg/cmd/server/otel.go
- Wire TracerProvider into serve.go signal handler so Shutdown and
  ForceFlush are called on SIGTERM/SIGINT, preventing span loss on exit
- Fix vendored cobrautil Viper global singleton bug: viper.SetEnvPrefix
  was mutating global state instead of the local instance (v.SetEnvPrefix)
- Touch pkg/cmd/util/util.go only to break import cycle between
  pkg/cmd/util and pkg/cmd/server; all flag registrations unchanged
- Add 20 tests across unit, integration, and system build tags
- Use ctxkey package for context key instead of a bare struct type
- Drop legacy otel-jaeger-* flags
- Collapse ShutdownOTelProvider to a single context shared across
  ForceFlush and Shutdown
- Move OTel initialization out of the Cobra layer into Config.Complete
  via an OTelConfig struct, matching the pattern used by other components
- Register provider shutdown with closeables so it participates in the
  ordered server shutdown sequence
- Add env var precedence tests
@miparnisari miparnisari force-pushed the fix/otel-lifecycle-native branch from edc506d to 7a2514c Compare May 25, 2026 23:51
Comment thread pkg/cmd/server/otel_integration_test.go Outdated
Contributor

@miparnisari miparnisari May 26, 2026

Do we really need three different files for tests?

EDIT: I've simplified this

miparnisari
miparnisari previously approved these changes May 26, 2026
Contributor

@miparnisari miparnisari left a comment

Approving, but I'm biased because I made changes 😄

Comment thread docs/spicedb.md
Comment on lines -126 to -131
--otel-endpoint string           OpenTelemetry collector endpoint - the endpoint can also be set by using environment variables
--otel-insecure                  connect to the OpenTelemetry collector in plaintext
--otel-provider string           OpenTelemetry provider for tracing ("none", "otlphttp", "otlpgrpc") (default "none")
--otel-sample-ratio float        ratio of traces that are sampled (default 0.01)
--otel-service-name string       service name for trace data (default "spicedb")
--otel-trace-propagator string   OpenTelemetry trace propagation format ("b3", "w3c", "ottrace"). Add multiple propagators separated by comma. (default "w3c")
Contributor

FYI - serve still has them. All the other commands don't use these values.
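
For readers unfamiliar with the `--otel-trace-propagator` format quoted above, here is a minimal sketch of how a comma-separated value could be split and checked against the three documented names. `parsePropagators` is a hypothetical helper written for illustration; it is not taken from the PR, which may parse the flag differently.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePropagators splits a comma-separated propagator list and
// rejects any name outside the set listed in the flag help text.
func parsePropagators(value string) ([]string, error) {
	known := map[string]bool{"b3": true, "w3c": true, "ottrace": true}

	var out []string
	for _, part := range strings.Split(value, ",") {
		name := strings.TrimSpace(part)
		if !known[name] {
			return nil, fmt.Errorf("unknown trace propagator %q", name)
		}
		out = append(out, name)
	}
	return out, nil
}

func main() {
	names, err := parsePropagators("w3c, b3")
	fmt.Println(names, err)

	_, err = parsePropagators("w3c,zipkin")
	fmt.Println(err)
}
```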

tstirrat15
tstirrat15 previously approved these changes May 26, 2026
Contributor

@tstirrat15 tstirrat15 left a comment

See comment, otherwise LGTM

Comment thread pkg/cmd/server/otel.go
@miparnisari miparnisari dismissed stale reviews from tstirrat15 and themself via ccb9fda May 26, 2026 16:58
@miparnisari miparnisari enabled auto-merge (squash) May 26, 2026 16:59
Contributor

@tstirrat15 tstirrat15 left a comment

LGTM

@miparnisari miparnisari merged commit af55756 into authzed:main May 26, 2026
44 of 45 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 26, 2026

Labels

area/cli: Affects the command line
area/dependencies: Affects dependencies
area/tooling: Affects the dev or user toolchain (e.g. tests, ci, build tools)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: OpenTelemetry traces are not flushed on shutdown, dropping spans on SIGTERM

3 participants